How can we best tell whether a video is clickbait?
We chose this topic because we like watching YouTube videos. A big obstacle to finding a YouTube video you enjoy is avoiding clickbait. When you click on a clickbait video, you waste time realizing what it is and then have to search again for what you actually wanted. Many features of a video can reveal that it is clickbait before you watch any of it. In this analysis, we identify those features.
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyverse)
library(stringi)
library(qdap)
library(SentimentAnalysis)
Here we load our data and prepare it for analysis. The first dataset comes from scraping YouTube videos and recording their features: title, likes, dislikes, views, description, comments, comment count, and whether the video is clickbait. The second dataset is a Kaggle dataset with many more titles, which lets us analyze video titles in more depth. We wanted more titles because the title is the easiest feature to check when deciding whether a video is clickbait, so it deserves a closer look.
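# Import 1: features scraped from YouTube videos
ClickbaitData <- read.csv("dataset.csv")
# Import 2: Kaggle dataset of video titles with clickbait labels
ClickbaitData2 <- read.csv("clickBait_Data.csv")
# Recode the 0/1 label as "False"/"True" and rename columns for the join
ClickbaitData2 <- ClickbaitData2 %>%
  mutate(clickbait = recode(clickbait, `0` = "False", `1` = "True")) %>%
  rename(class = clickbait, title = titles)
# Full outer join on title: keeps every row from both datasets
JoinedData <- merge(x = ClickbaitData, y = ClickbaitData2, by = "title", all = TRUE)
JoinedData %>%
  head()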
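To make the effect of the join concrete, here is a minimal sketch on made-up data. The data frames `scraped` and `kaggle`, their rows, and the assumption that the scraped dataset also has a `class` column are all hypothetical; they are not taken from the real CSV files. The sketch uses only base R and illustrates two things: a full outer join (`all = TRUE`) keeps titles that appear in only one dataset, and a non-key column shared by both inputs comes back as two columns with `.x` and `.y` suffixes, which may need to be combined afterwards.

```r
# Hypothetical toy data standing in for the two CSV files
scraped <- data.frame(
  title = c("You won't BELIEVE this", "Intro to linear regression"),
  views = c(120000, 3400),
  class = c("True", "False"),
  stringsAsFactors = FALSE
)
kaggle <- data.frame(
  title = c("Intro to linear regression", "10 secrets doctors hate"),
  class = c("False", "True"),
  stringsAsFactors = FALSE
)

# Full outer join on title, mirroring the merge() call above
joined <- merge(x = scraped, y = kaggle, by = "title", all = TRUE)

# `class` exists in both inputs, so merge() returns class.x and class.y.
# One possible way to combine them: prefer the scraped label, and fall
# back to the Kaggle label when the scraped one is missing.
joined$class <- ifelse(is.na(joined$class.x), joined$class.y, joined$class.x)

print(joined)
```

Titles found only in the Kaggle data have `NA` for scraped-only features such as `views`, so later steps that use those features should filter or handle missing values.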